Abstract



We describe a universal information compression scheme that compresses any pure 
quantum i.i.d. source asymptotically to its von Neumann entropy, with no prior knowledge
of the structure of the source. We introduce a diagonalisation procedure that
enables any classical compression algorithm to be utilised in a quantum context. Our 
scheme is then based on the corresponding quantum translation of the classical Lempel- 
Ziv algorithm. Our methods lead to a conceptually simple way of estimating the entropy 
of a source in terms of the measurement of an associated length parameter while main- 
taining high fidelity for long blocks. As a by-product we also estimate the eigenbasis 
of the source. Since our scheme is based on the Lempel-Ziv method, it can be applied 
also to target sequences that are not i.i.d. 

1 Introduction 

In addition to its evident utility in practical communication issues, the concept of informa- 
tion compression provides a bridge between the abstract theory of information and concrete 
physics: it characterises the minimal physical resources (of various appropriate kinds) that
are necessary and sufficient to faithfully encode or represent information. 

In the case of quantum information, the study of optimal compression rates is espe- 
cially interesting as it relates directly to non-orthogonality of states [||] and entanglement, 
and thus provides a new tool for investigating foundational properties of these uniquely 
quantum features. Almost all work to date on quantum information compression has stud- 
ied compression properties of a so-called independent identically distributed source (i.i.d. 
source). (However see [?] for an interesting non-i.i.d. situation.) Let $\mathcal{E} = \{|\sigma_i\rangle; p_i\}$ be
an ensemble of (pure) quantum signal states $|\sigma_i\rangle$ with assigned probabilities $p_i$. An i.i.d.
source comprises an unending sequence of states chosen independently from $\mathcal{E}$. For each
integer $n$ we have an ensemble of signal blocks of length $n$. Writing $I = i_1 \ldots i_n$, the states
are $|\sigma_I\rangle = |\sigma_{i_1}\rangle \otimes \ldots \otimes |\sigma_{i_n}\rangle$ with probabilities $p_I = p_{i_1} \cdots p_{i_n}$. Let $\mathcal{H}$ (with dimension $d$)
denote the Hilbert space of single signals and let $\mathcal{Q}_\alpha$ denote the space of all mixed states of
$\alpha$ qubits (or the smallest integer greater than $\alpha$ if $\alpha$ is not an integer). Then $n$-blocks $|\sigma_I\rangle$
are in $\mathcal{H}^{\otimes n}$ and in $\mathcal{Q}_{n \log d}$. (In this paper log will denote logarithms to base 2.) To define
the notion of compression we first introduce the fidelity

$$F(|\psi\rangle, \rho) = \langle\psi|\rho|\psi\rangle \qquad (1)$$

between any pure and mixed state. More generally if $\rho$ and $\omega$ are mixed we define fidelity
by [?]

$$F(\rho, \omega) = \left( \mathrm{tr} \sqrt{\sqrt{\rho}\,\omega\sqrt{\rho}} \right)^2 \qquad (2)$$



The von Neumann entropy $S$ of an ensemble $\mathcal{E}$ is defined by

$$S = -\mathrm{tr}\, \rho \log \rho$$

where $\rho = \sum_i p_i |\sigma_i\rangle\langle\sigma_i|$ is the overall density matrix of the signals.
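The two definitions above can be evaluated numerically for small examples. The following is a minimal sketch, assuming NumPy and assuming the squared form of the mixed-state fidelity; the ensemble (the states $|0\rangle$ and $|+\rangle$ with equal probability) and all function names are illustrative choices, not taken from the paper.

```python
import numpy as np

def psd_sqrt(m):
    """Square root of a positive semidefinite Hermitian matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, 0.0, None)  # remove tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def fidelity(rho, omega):
    """F(rho, omega) = (tr sqrt(sqrt(rho) omega sqrt(rho)))^2 (assumed squared convention)."""
    s = psd_sqrt(rho)
    return float(np.real(np.trace(psd_sqrt(s @ omega @ s))) ** 2)

def von_neumann_entropy(rho):
    """S = -tr(rho log2 rho), computed from the eigenvalues of rho."""
    vals = np.linalg.eigvalsh(rho)
    vals = vals[vals > 1e-12]  # 0 log 0 is taken to be 0
    return float(-(vals * np.log2(vals)).sum())

# Hypothetical ensemble: |0> and |+> with equal probability.
ket0 = np.array([1.0, 0.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2)
rho = 0.5 * np.outer(ket0, ket0) + 0.5 * np.outer(ketp, ketp)
entropy = von_neumann_entropy(rho)
```

Because the two signal states are non-orthogonal, the entropy of this ensemble (roughly 0.6) is strictly below the one bit that two equiprobable classical signals would need.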

An encoding-decoding scheme for blocks of length $n$, to $\alpha$ qubits per signal and average
fidelity $1 - \epsilon$, is defined by the following ingredients:

(i) An encoding operation $E_n : \mathcal{H}^{\otimes n} \to \mathcal{Q}_{n\alpha}$, which is a completely positive trace preserving
(CPTP) map. $E_n(|\sigma_I\rangle)$ is a (mixed) state of $n\alpha$ qubits called the encoded or compressed
version of $|\sigma_I\rangle$.

(ii) A decoding operation $D_n : \mathcal{Q}_{n\alpha} \to \mathcal{Q}_{n \log d}$ which is also a CPTP map. We write
$\tilde{\sigma}_I = D_n E_n(|\sigma_I\rangle)$ and call it the decoded version of $|\sigma_I\rangle$. Note that $\tilde{\sigma}_I$ is generally a mixed
state.

(iii) The average fidelity between $|\sigma_I\rangle$ and $\tilde{\sigma}_I$ is $1 - \epsilon$:

$$\sum_I p_I F(|\sigma_I\rangle, \tilde{\sigma}_I) = 1 - \epsilon$$

We say that the source $\mathcal{E}$ may be compressed to $\alpha$ qubits per signal if the following condition
is satisfied: for all $\epsilon > 0$ there is an $n_0$ such that for all blocks of length $n > n_0$ there is an
encoding-decoding scheme for blocks of length $n$ to $\alpha$ qubits per signal and average fidelity
at least $1 - \epsilon$.

The above definitions are motivated by source coding for i.i.d. sources in Shannon's clas- 
sical information theory (cf for an exposition). Indeed if the signal states are mutually 
orthogonal and the coding/decoding operations are classical probabilistic processes, then 
we regain the standard classical theory. The quantum generalisation of Shannon's source 
coding theorem is Schumacher's quantum source coding theorem [?, 16], stating
that the optimal compression rate is the von Neumann entropy $S$ of the signal ensemble.
More precisely, if $\alpha \neq S$ then $\mathcal{E}$ may be compressed to $\alpha$ qubits per signal iff $\alpha > S$. In
these source coding theorems it is assumed that we have knowledge of the signal ensemble
states $|\sigma_i\rangle$ and their prior probabilities $p_i$. (Actually knowledge of the density matrix
$\rho = \sum_i p_i |\sigma_i\rangle\langle\sigma_i|$ alone suffices.)

The question of universal compression concerns a situation in which we have only partial, 
or even no knowledge, about the i.i.d. source $\mathcal{E}$. We may even go further and ask about
compressing a target sequence from a source that is not even assumed to be i.i.d. (but 
perhaps has other properties e.g. a Markovian source). Thus universal compression may 
be studied in the presence of varying degrees of prior knowledge about the source. In this 
paper we will consider universal compression of i.i.d. sources. However in contrast to all 
other quantum compression schemes proposed to date, our methods can also be applied 
to non-i.i.d. sources in a natural way, in various situations (that will become clear in our 
exposition below). 

A classical i.i.d. source is fully characterised just by its probability distribution $\{p_i\}$
of signals. In a quantum i.i.d. source, for the purpose of studying the action of the
encoding and decoding maps, each signal state may be taken to be in the mixed state
$\rho = \sum_i p_i |\sigma_i\rangle\langle\sigma_i|$. Hence a quantum i.i.d. source is fully characterised by the classical
probability distribution $\{\lambda_i\}$ of the eigenvalues of $\rho$ together with the specification of a
corresponding orthonormal eigenbasis $\{|\lambda_i\rangle\}$. The distribution $\{\lambda_i\}$ is the direct analogue
of the distribution $\{p_i\}$ of a classical source, and the extra freedom in the quantum case
of the orientation of the eigenbasis makes the problem of universal quantum compression
inherently more difficult than its classical counterpart.

Before presenting our main results we give a brief overview of existing work on universal
quantum information compression. The basic technique of so-called Schumacher compression
used in the Schumacher source coding theorem utilises the typical subspace
of $\rho$. This construction requires knowledge of both the eigenvalues and eigenvectors of $\rho$.
As such, it does not appear to offer any generalisation to a universal compression scheme
with prior knowledge of anything less than full knowledge of the source. In [17] Jozsa
et al. presented a universal compression scheme for quantum i.i.d. sources, requiring
prior knowledge of an upper bound $S_0 > S$ on the von Neumann entropy of the source but
requiring no prior knowledge of the orientation of the eigenbasis of the source. The scheme
compressed the source to $S_0$ qubits per signal (in contrast to the optimal $S$ in the case
that the source is known). Hence if the von Neumann entropy (or set of eigenvalues) of the
source is known then this scheme is universal, with no prior knowledge of the eigenbasis.

In Q and a quantum analogue of Huffman variable length coding was developed. 
The techniques of Schumacher and Westmoreland in motivated the formulation of our 
two stage compression model below. Although the Schumacher-Westmoreland scheme is 
not presented as a universal scheme, if we adjoin the method of "smearing" measurements 
used by Hayashi and Matsumoto 0, |8| (and also described and used by us below) then the 
scheme can be made universal for a situation in which the orientation of the eigenbasis is 
known but the eigenvalue distribution is unknown. 

The first fully universal compression scheme for quantum i.i.d. sources was presented 
by Hayashi and Matsumoto 0. This scheme compresses any quantum i.i.d. source to its 
von Neumann entropy S, requiring no prior knowledge of the eigenbasis or eigenvalues of 
the source density matrix. Their method is based on the scheme of supplemented by 
an estimation of the eigenvalues of the source and hence of S. 

In this paper we present an alternative fully universal quantum compression scheme 
with various novel features. In classical information theory there exists a variety of
schemes for universal classical information compression. Some of these, such as the Lempel-
Ziv method, apply even to situations in which the target string does not come from an i.i.d.
source. Below we will introduce a "diagonalisation procedure" which effectively enables 
any such classical scheme to be transferred into the context of quantum compression. Then 
utilising the measurement smearing technique of [?], in conjunction with a further iterative
procedure, we will achieve universal quantum information compression. In contrast to
the scheme of [?], our scheme will include an estimation of the eigenbasis orientation, and
we estimate the source entropy S via a conceptually simpler estimation of a block length 
parameter, whose knowledge is equivalent to that of S. Our scheme will be based on 
transferring the classical Lempel-Ziv scheme into a quantum context. Hence (in contrast 
to any other existing scheme) it will be applicable even to sources that are not assumed to 
be i.i.d., although the question of optimality of the achieved compression rate in these more 
general situations remains to be explored. 
2 Two-Stage Compression Model



Let $\mathcal{E} = \{|\sigma_i\rangle; p_i\}$ be the signal ensemble of a quantum i.i.d. source and let $\rho = \sum_i p_i |\sigma_i\rangle\langle\sigma_i|$.
In the signal state space let $\{|e_i\rangle\}$ be a fixed chosen basis called the computational basis.
The process of compressing blocks of length $n$ of the source will in general consist of some
unitary manipulations of the state $\rho^{\otimes n}$ (and we also include the possible adjoining of ancillary
qubits in a standard state), and some operations which are not unitary (i.e. discarding
qubits or measurement operations). We know from the study of quantum circuits [?] that
the order of operations can be re-arranged to put all the non-unitary steps at the end. Thus 
the whole procedure can be naturally divided into two separate stages. The first, consisting 
of all the unitary manipulations of the state, will be called the "condensation" stage. This 
stage takes the form of an algorithm to be performed by a quantum computer. Naturally, 
it cannot decrease the total length of the sequence. 

At the end of the condensation stage, the string should have been manipulated in such
a way that the first $nR_c$ qubits of output contain a faithful representation of the data in
the input (for some "condensation rate" $S(\rho) \le R_c \le 1$). The remainder of the qubits are
in a state asymptotically independent of the input (for simplicity, we assume the state $|0\rangle$).
We call these qubits "blank".

The second stage consists of the measurement operations. This is where actual com- 
pression takes place, since now the dimension of the Hilbert space in which the state lives 
can be reduced. It is called the "truncation" stage, since it tends to involve removing the
"blank" qubits at the end. In truncation the length of the string is reduced to $nR_t$ for
some $R_c \le R_t \le 1$. Asymptotic independence between the "blanks" and the data qubits is
equivalent to saying that this truncation can be performed with fidelity $F \to 1$ as $n \to \infty$.

Determining exactly where the truncation cut is to be made is in general a difficult task. 
Previous compression schemes have relied upon given prior knowledge of the source to do 
this. In the present case we do not assume that such information is given a priori. 

3 Lempel-Ziv Algorithm 

For the condensation stage of our compression scheme we will utilise the basic formalism of 
the classical universal Lempel-Ziv compression scheme, transferred to a quantum context. 
The precise details of the Lempel-Ziv method will not be required but we give an outline 
of the method. In fact any other classical scheme that is universal for classical i.i.d. sources 
could be used. 

The classical Lempel-Ziv compression scheme [§, |l^ asymptotically compresses the out- 
put of an i.i.d. source with unknown probability distribution to H bits per signal, where H 
is the Shannon entropy of the distribution. It depends upon the fact that at any time in 
the decoding process, there is a significant quantity of data that is known to both sender 
and receiver. By making reference to this data as a shared resource, the sender can more 
efficiently transmit further signals from the same source. The encoder scans the sequence, 
building up a dictionary of subsequences in such a way that each new entry in the dictionary 
is a 1-bit extension of some previous word. When the whole sequence has been parsed in this 
way, this internal structure of the dictionary is transmitted as a list of references; instead 
of sending a whole subsequence, its position in the dictionary is transmitted, plus the single
extra bit. In the limit, these references are logarithmically shorter than the subsequences 
they represent. The decoder reverses this procedure, building up the encoder's dictionary 
from the list of references. The original sequence is reconstructed simply by concatenating 
the words of the dictionary. The Lempel-Ziv code is therefore lossless (i.e. has fidelity 1) 
and the compression rate H is achieved as an average value over all possible inputs. 
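The dictionary parsing outlined above can be sketched as follows. This is a minimal illustration in the style of the LZ78 variant, where each new dictionary entry extends an earlier one by a single bit. It is offered as an assumption about which variant is meant, not as the exact scheme the paper relies on; the function names are hypothetical, and references are kept as Python tuples rather than packed into bits.

```python
def lz78_encode(bits):
    """Parse a bit string into (dictionary index, extra bit) references."""
    dictionary = {"": 0}       # phrase -> index; index 0 is the empty phrase
    refs, phrase = [], ""
    for b in bits:
        if phrase + b in dictionary:
            phrase += b        # keep extending until the phrase is new
        else:
            refs.append((dictionary[phrase], b))
            dictionary[phrase + b] = len(dictionary)
            phrase = ""
    if phrase:                 # leftover phrase that is already in the dictionary
        refs.append((dictionary[phrase], ""))
    return refs

def lz78_decode(refs):
    """Rebuild the encoder's dictionary from the references and concatenate its words."""
    phrases, out = [""], []
    for index, b in refs:
        word = phrases[index] + b
        phrases.append(word)
        out.append(word)
    return "".join(out)

message = "1011010100010"
references = lz78_encode(message)
```

Here the message is parsed into the words 1, 0, 11, 01, 010, 00, 10, and decoding the reference list returns the original string exactly, which is the lossless property noted in the text.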

Bennett [5] showed that it is possible to implement any classical algorithm reversibly,
with only a polynomial increase in time and space resources. We can therefore construct a 
reversible version of the Lempel-Ziv algorithm (or any other classical universal compression 
algorithm), which may be run on a quantum computer. 

The resulting algorithm treats the orthonormal states of the computational basis as if 
they were classical signals. For sources which are not diagonal in the computational basis 
the input sequence can be regarded as a superposition of "pseudo-classical" sequences, each 
of which is operated upon independently. However as we will show below, the action of the 
quantum implementation of the Lempel-Ziv algorithm on such non-diagonal i.i.d. quantum 
sources is simply to condense them to a rate asymptotically approaching H qubits per 
signal, where $H \ge S$ is the Shannon entropy of a suitable probability distribution. This
feature, of $H$ being generally greater than $S$, embodies the difficulty arising from a mismatch
between the eigenbasis of the source and the computational basis of the computer. As part 
of our main result we will show how this difficulty can be overcome. 

In order to help us examine the effect of running the algorithm, we note that the quantum 
implementation (via Bennett's result) of any classical deterministic algorithm merely enacts 
a permutation of the set of all strings of computational basis states at each step. That is, 
each state $|e_I\rangle = |e_{i_1}\rangle \otimes \ldots \otimes |e_{i_n}\rangle$ is mapped to some $|e_{P(I)}\rangle$, where $P$ is a permutation on
the set of sequences $I = i_1 \ldots i_n$; no superposition or probabilistic mixing is created. This
action, denoted:

$$|e_I\rangle \to |e_{P(I)}\rangle \qquad (3)$$

is much more restricted than that of an arbitrary unitary transformation:

$$|e_I\rangle \to \sum_J b_{IJ} |e_J\rangle$$

and the restriction will be important for us later (cf Theorem 1 below). 

4 Condensation rate with mismatched bases 

If we knew the eigenbasis of the source density matrix, we could simply set the computer 
to use this basis as its computational basis, and analysis of the output from the algorithm 
would be relatively simple. But since we are aiming to achieve fully universal compression, 
we must assume that we do not know the source's eigenvectors. 

We denote the computational basis by $B_c = \{|e_i\rangle\}$ and denote the eigenbasis of the
source by $B_s = \{|\lambda_i\rangle\}$. Our first task is to study the effect of the algorithm on an i.i.d.
source whose eigenbasis $B_s$ does not coincide with the computational basis $B_c$. We do this
by introducing a hypothetical diagonalisation procedure which has the effect of making the 
source appear diagonal in the computational basis. 
4.1 The Diagonalisation procedure



Given any orthonormal basis $B = \{|i\rangle\}$ with $n$ states, we define

$$D : |i\rangle |j\rangle \to |i\rangle |i \oplus j\rangle \qquad (4)$$

where $\oplus$ denotes addition mod $n$. This operation is commonly used in quantum information
processing, particularly in the special case of $j = 0$:

$$D : |i\rangle |0\rangle \to |i\rangle |i\rangle \quad \forall\, i \in B \qquad (5)$$

where it serves as a duplication operation. Note that this only copies basis states, and not
superpositions of them - there is no conflict with the No-Cloning theorem. The action of $D$
on a superposition is an entangled state:

$$D : \sum_i a_i |i\rangle |0\rangle \to \sum_i a_i |i\rangle |i\rangle$$

This is a fatal feature for those who would like to clone quantum information, but it will
be the key to solving our problem of mismatched bases.

If we apply $D$ to each signal state in the input sequence, we will produce a duplication
(relative to the basis $B$) of this input sequence, entangled with the original. If we allow the
computer only to operate on the original, or only on the duplicate, and not to make joint
operations on both, then the state addressed is described by tracing out one of the systems.
From the right-hand side of the superposition equation above we see that the reduced state
of either system is $\sum_i |a_i|^2 |i\rangle\langle i|$, which is always diagonal in the basis $B$.

If we take $B$ to be the computational basis of the computer and only address each
part of the duplication separately, the computer will act on an input that is diagonal in
its computational basis. Note that everything so far is done coherently. Although the
computer is now addressing a mixed state, no measurements have been carried out, and no
information has been lost.

Now, imagine allowing the computer to address each part of the duplication in turn;
it carries out the algorithm on the first sequence, leaving the duplicate unchanged, then
repeats the process on the duplicate, leaving the first part undisturbed. Alternatively, we
can imagine building a computer twice as large, partitioned into two sides, each of which
works simultaneously on a single copy of the sequence. Finally apply $D^{-1}$ at each signal
position in the resultant state.

We will show below that this combined process will leave a final state in the first register
that is identical to the result of simply applying the algorithm to the given input sequence
with no diagonalisation operations being applied. Since in the alternative process (involving
the operation $D$) the algorithm acts only on states diagonal in the computational basis, we
can use this equivalence to give a simple derivation of the condensation properties that
result when the computational basis and eigenbasis of the source are not matched.

Consider any classical deterministic algorithm which has been formulated in a reversible
way and implemented on a quantum computer. Thus any step $C$ of the algorithm is a
permutation of the computational basis states, in the sense of Eq. (3). We wish to prove
that the following diagram has a "pseudo-commutation", i.e. that the two ways of going
around the loop give the same result:

$$
\begin{array}{ccc}
|e_I\rangle |0\rangle & \xrightarrow{\;C \otimes I\;} & |e_{P(I)}\rangle |0\rangle \\
\downarrow \mathcal{D} & & \uparrow \mathcal{D}^{-1} \\
|e_I\rangle |e_I\rangle & \xrightarrow{\;C \otimes C\;} & |e_{P(I)}\rangle |e_{P(I)}\rangle
\end{array}
$$

where $\mathcal{D} = D^{\otimes n}$, the operation $D$ applied at each position in the sequence.



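Theorem 1: For any unitary operation $C$, the equality $\mathcal{D}^{-1}(C \otimes C)\mathcal{D} = (C \otimes I)$ is
satisfied iff $C$ enacts a permutation on the computational basis states.

Proof: We compare the two sides on inputs of the form $|e_I\rangle |0\rangle$, as in the diagram. If we
write $C$ as the unitary operation $U : |e_I\rangle \to \sum_J b_{IJ} |e_J\rangle$, we get:

$$|e_I\rangle |0\rangle \xrightarrow{\;\mathcal{D}\;} |e_I\rangle |e_I\rangle \xrightarrow{\;U \otimes U\;} \sum_{JK} b_{IJ} b_{IK} |e_J\rangle |e_K\rangle \xrightarrow{\;\mathcal{D}^{-1}\;} \sum_{JK} b_{IJ} b_{IK} |e_J\rangle |e_K \ominus e_J\rangle \qquad (6)$$

where $\ominus$ denotes subtraction mod $n$. For this to be of the form $\sum_J b_{IJ} |e_J\rangle |0\rangle$, it is necessary
that $|e_K\rangle$ and $|e_J\rangle$ are always identical, i.e. $U$ must map each $|e_I\rangle$ to a multiple of a unique
$|e_J\rangle$ (and not a superposition):

$$|e_I\rangle \to e^{i\alpha_I} |e_J\rangle.$$

Finally substituting this form of $U$ into Eq. (6) and equating with $(U \otimes I) |e_I\rangle |0\rangle$ gives $\alpha_I = 0$.
Hence $U$ must be a permutation as claimed. Conversely if $C$ is a permutation then an easy
calculation shows that $\mathcal{D}^{-1}(C \otimes C)\mathcal{D} = (C \otimes I)$ holds. ■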

The final state arrived at is the same whether we insert the diagonalisation operation or 
not. The duplication process can therefore be considered merely a mathematical convenience 
to save us from having to directly analyse the action of the condensation algorithm on non- 
diagonal inputs. 
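The statement of Theorem 1 can be checked numerically on a small example. The sketch below assumes NumPy, treats the whole register as a single $d$-level system (a simplification of the position-by-position operation used in the text), and compares the two sides only on inputs whose ancilla is $|0\rangle$, which is the case the diagram describes. All function names are hypothetical.

```python
import numpy as np

def duplication_operator(d):
    """D|i>|j> = |i>|(i + j) mod d>, written as a d^2 x d^2 permutation matrix."""
    D = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            D[i * d + (i + j) % d, i * d + j] = 1.0
    return D

def permutation_matrix(perm):
    """Matrix of the map |i> -> |perm[i]>."""
    d = len(perm)
    C = np.zeros((d, d))
    for i, p in enumerate(perm):
        C[p, i] = 1.0
    return C

def agrees_on_blank_ancilla(C):
    """Check D^{-1}(C x C)D = (C x I) on all basis inputs |i>|0>."""
    d = C.shape[0]
    D = duplication_operator(d)
    lhs = D.T @ np.kron(C, C) @ D      # D is a real permutation matrix, so D^{-1} = D^T
    rhs = np.kron(C, np.eye(d))
    blank = [i * d for i in range(d)]  # column indices of the inputs |i>|0>
    return bool(np.allclose(lhs[:, blank], rhs[:, blank]))

hadamard = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
phase = np.diag([1.0, 1j])
```

A cyclic permutation such as `permutation_matrix([2, 0, 1])` passes the check, while the Hadamard matrix (which creates superpositions) and the diagonal phase matrix (a permutation up to non-zero phases) both fail it, matching the two conditions derived in the proof.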



4.2 Rate of Condensation 

In the special case of $B_c = B_s$, the computer views the source as emitting classical signals
$\{|e_i\rangle\}$ with probability distribution $\{p_i\}$, where the $p_i$ are simply the eigenvalues of the
source density matrix. Thus the asymptotic condensation rate is the von Neumann entropy
of the source, $S(\rho)$. Therefore, in this special case, the Lempel-Ziv algorithm achieves
asymptotically optimal condensation.

If $B_c \neq B_s$ then by Theorem 1 the condensation effect of the algorithm acting directly
on the source $\mathcal{E}$ can be found by considering the reduced state of the duplication of $\mathcal{E}$.
The rate achieved will be the same as the condensation rate of a source diagonal in the
computational basis $B_c$, but whose signal probabilities are given by the eigenvalues of this
reduced density matrix.
If $\lambda_i$ and $|\lambda_i\rangle$ are the eigenvalues and eigenvectors of $\mathcal{E}$ then we can expand the eigenvectors
in the computational basis $\{|e_j\rangle\}$ as

$$|\lambda_i\rangle = \sum_j a_{ij} |e_j\rangle$$

and the source density matrix becomes

$$\rho = \sum_i \lambda_i |\lambda_i\rangle\langle\lambda_i| = \sum_{ijk} \lambda_i a_{ij} a^*_{ik} |e_j\rangle\langle e_k| \qquad (7)$$

When we append the ancilla and apply $D$, we get

$$\rho \xrightarrow{\;D\;} \sum_{ijk} \lambda_i a_{ij} a^*_{ik} \, |e_j\rangle\langle e_k|_A \otimes |e_j\rangle\langle e_k|_B \qquad (8)$$

and taking the partial trace over system $B$ gives the reduced density matrix:

$$\rho' = \sum_{ij} \lambda_i |a_{ij}|^2 \, |e_j\rangle\langle e_j| \qquad (9)$$

This density matrix, which is diagonal in the computational basis, describes the distribution
of states addressed by the computer if the duplication process is carried out. We
therefore refer to $\rho'$ as the effective source density matrix.

The eigenvalues $p_j$ of $\rho'$ are thus given by $p_j = \sum_i \lambda_i |a_{ij}|^2$, where the $a_{ij}$ are the
coefficients in the expansion of the source eigenstate $|\lambda_i\rangle$. We may write these coefficients as
$\langle e_j|\lambda_i\rangle$, and therefore the $j$-th eigenvalue of $\rho'$ is:

$$\sum_i \lambda_i \langle e_j|\lambda_i\rangle\langle\lambda_i|e_j\rangle = \langle e_j|\rho|e_j\rangle$$

i.e. the diagonal matrix elements of $\rho$ when $\rho$ is written in the computational basis. Inserting
this expression into the formula for Shannon entropy gives:

$$R_c(\rho, B_c) = -\sum_{|e_j\rangle \in B_c} \langle e_j|\rho|e_j\rangle \log \langle e_j|\rho|e_j\rangle \qquad (10)$$

This is the general formula for the asymptotic rate of condensation for the source $\mathcal{E}$ achieved
when we work in an arbitrary computational basis $B_c$. In the special case $B_c = B_s$, the
formula reduces to $S(\rho)$, as expected. Furthermore since the matrix $[a_{ij}]$ represents the
transition between two orthonormal bases, it is a unitary matrix and then $[r_{jk}] = [|a_{kj}|^2]$ is
doubly stochastic. Consequently the eigenvalues $p_j = \sum_k r_{jk} \lambda_k$ of $\rho'$ are a doubly stochastic
transform of the eigenvalues of $\rho$, and by a monotonicity theorem for entropy we have
$S(\rho') \ge S(\rho)$. Thus the algorithm with a mismatch of bases acts as a standard condensation
process but incurs a loss of optimality of the achieved condensation rate as quantified by
the above formulae.
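The loss of optimality described above can be made concrete for a single qubit. The following sketch assumes NumPy and a hypothetical source with eigenvalues 0.9 and 0.1 whose eigenbasis is rotated by an angle `theta` away from the computational basis. It evaluates the condensation rate as the Shannon entropy of the diagonal of $\rho$ in the computational basis and compares it with the von Neumann entropy; the function names and parameter values are illustrative.

```python
import numpy as np

def shannon_entropy(probs):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    probs = np.asarray(probs, dtype=float)
    probs = probs[probs > 1e-12]
    return float(-(probs * np.log2(probs)).sum())

def von_neumann_entropy(rho):
    """Entropy of the eigenvalue distribution of rho."""
    return shannon_entropy(np.linalg.eigvalsh(rho))

def condensation_rate(rho):
    """Shannon entropy of the diagonal elements <e_j|rho|e_j> in the computational basis."""
    return shannon_entropy(np.real(np.diag(rho)))

def rotated_source(eigenvalues, theta):
    """Hypothetical qubit source whose eigenbasis is rotated by theta from the computational basis."""
    c, s = np.cos(theta), np.sin(theta)
    u = np.array([[c, -s], [s, c]])
    return u @ np.diag(eigenvalues) @ u.T

angles = (0.0, np.pi / 8, np.pi / 4)
rates = {theta: condensation_rate(rotated_source([0.9, 0.1], theta)) for theta in angles}
source_entropy = von_neumann_entropy(rotated_source([0.9, 0.1], np.pi / 8))
```

With matched bases (`theta = 0`) the rate equals the von Neumann entropy; as the mismatch grows the rate increases, and at `theta = pi/4` the diagonal is uniform, so no condensation is achieved at all.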
5 Truncation



In the truncation stage we apply the measurements which actually reduce the physical 
resources occupied by the quantum information. The difficulty of truncation is in identifying 
how many qubits are "blanks" and may safely be discarded. In doing this, we are effectively 
estimating the condensation rate achieved, Rc, which depends upon p and is therefore not 
known a priori. 

Before treating the truncation problem itself, we will first consider a simpler idealised
situation. We are presented with a sequence of $n$ qubits, and told that it is composed of
two parts: the first part is a sequence of $k$ maximally mixed qubits (each in the state $I/2$),
for some integer $0 \le k \le n$, while the remaining $n - k$ qubits are in the zero state $|0\rangle$. Our
task is to determine the value of $k$ as accurately as possible, in such a manner that, when
$n \to \infty$ (keeping the fraction $k/n$ constant) the global fidelity of the state remains arbitrarily
high, $F \to 1$.

We can argue that the state described above resembles the output from the condensation
algorithm. The output state does consist, approximately, of two distinct substrings: a
"data" part to be preserved and a "blank" part, whose qubits are all in the state $|0\rangle$ (except
for a small "tail" at the end of the data). The size of the "data" section is an unknown
fraction $k/n$ of the total length $n$; for large $n$, this fraction is approximately equal to $R_c(\rho, B_c)$,
which is independent of $n$.

Bearing in mind these approximations, we can assert that a solution to the simplified 
problem will get us most of the way to a solution of the truncation problem itself. After- 
wards, we will weaken the assumptions to something more realistic, whilst preserving the 
solution. 

Following [?] we define a projector

$$\Pi_l = I^{\otimes l} \otimes (|0\rangle\langle 0|)^{\otimes (n-l)} \qquad (11)$$

which acts on a sequence of $n$ qubits, projecting onto the subspace in which the last $(n - l)$
qubits are in the state $|0\rangle$. To locate the position $k$ of the boundary we will develop a
strategy that involves applying a sequence of (suitably smeared) $\Pi_l$'s with decreasing $l$
values.

If we apply this projector to the sequence at some position $l$, it will (in general) tell us
whether the "boundary" position $k$ lies to the left or to the right of $l$. If the projector is to
the right of the boundary then it will certainly project into its positive subspace, and cause
no disturbance. However, if it lies to the left of the boundary then it will certainly cause
some disturbance, whichever outcome is obtained. The closer to the boundary it lies, the
greater the disturbance caused. The probabilities and magnitudes of disturbance depend
upon the number of maximally mixed qubits to the right of the projector, which we denote
$s$. The projector is then $\Pi_l = \Pi_{k-s}$.

If a projector $\Pi_{k-s}$ projects to its positive subspace (giving outcome "1"), then all qubits
to the right of the projector (including those that were maximally mixed) are set to the zero
state. Since $s$ qubits were maximally mixed, the probability to project to $|0\rangle^{\otimes s}$ is simply
$2^{-s}$. The result of this disturbance is so great that, as an approximation, we can assign a
fidelity of zero to the resultant state. This is a valid thing to do, since we are only looking
for a lower bound on the fidelity achieved. We will also need to derive an upper bound on
the probability of such an "error" in our procedure.

Conversely, if $\Pi_{k-s}$ projects to its perpendicular subspace (outcome "0"), the maximally
mixed qubits to the right of the projector are projected away from zero. This is also a
disturbance, although in general a smaller one. The probability of this is $(1 - 2^{-s})$. Using
Eq. (2) the fidelity after the disturbance is readily seen to be $(1 - 2^{-s})$. The outcome of the
projector tells us that we have found a position to the left of the boundary, and no more
projections should be made - the process terminates.

Therefore, a projector located $s$ places to the left of the boundary maintains a fidelity
of at least $(1 - 2^{-s})^2$, and contributes $2^{-s}$ to the error probability (i.e. the probability of
not registering the presence of the boundary).
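The probabilities and the fidelity quoted for a single projector can be verified directly. The sketch below assumes NumPy and tracks only the $s$ maximally mixed qubits to the right of the projector, since the other qubits are unaffected. It also assumes the squared form of the mixed-state fidelity, which for two states diagonal in the same basis reduces to $(\sum_i \sqrt{p_i q_i})^2$. The function name is hypothetical.

```python
import numpy as np

def projector_statistics(s):
    """Outcome probabilities and post-measurement fidelity for a projector s places left of the boundary."""
    dim = 2 ** s
    before = np.full(dim, 1.0 / dim)   # diagonal of the maximally mixed state on s qubits
    p_positive = before[0]             # outcome "1": all s qubits are found in |0>
    after = before.copy()
    after[0] = 0.0                     # outcome "0": the all-zero string is projected out
    p_perpendicular = after.sum()
    after /= p_perpendicular           # renormalise the post-measurement state
    # Both states are diagonal in the same basis, so the fidelity is (sum_i sqrt(p_i q_i))^2.
    fid = float(np.sum(np.sqrt(before * after)) ** 2)
    return float(p_positive), float(p_perpendicular), fid
```

For every $s$ this reproduces the values in the text: outcome "1" has probability $2^{-s}$, outcome "0" has probability $1 - 2^{-s}$, and the fidelity after outcome "0" is $1 - 2^{-s}$, so the product of the last two gives the $(1 - 2^{-s})^2$ contribution.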

No strategy in which we simply make measurements with $\Pi_l$ projectors can safely give
us the information required. Wherever we choose to apply the projectors, there can always
be cases of $l$ values which lie very close to the left of the boundary value $k$ (i.e. $s$ is very
small), and therefore cause a large disturbance. Admittedly, the probability of such an
event is low if $k$ is chosen at random. That is, for most values of $k$ (i.e. most sources) such
a strategy would work well. However, probabilistic success is not good enough for universal
compression, which must give high fidelity for all sources, not just the majority of them.

To avoid this problem we adopt a method of "smearing" measurements that was used
by Hayashi and Matsumoto [?, 8]. We define a POVM using an equally-weighted average
of a set of projectors $\Pi_l$, each offset by a different amount from a common basepoint:

$$\bar{\Pi}_l = \frac{1}{Y} \sum_{m=l}^{l+Y-1} \Pi_m \qquad (12)$$

where $Y$, the number of projectors in each POVM "cluster", is some parameter we are free
to choose. The POVM elements are then $\{\bar{\Pi}_l, I - \bar{\Pi}_l\}$. Physically we may interpret this
POVM as applying a random choice, $\Pi_m$, of the projectors $\Pi_l, \ldots, \Pi_{l+Y-1}$ (chosen with equal
prior probabilities $1/Y$) and then forgetting the value of $m$. Thus for any given $k$ value and
any choice of $l$, the probability that $\Pi_m$ is close to $k$ (and hence causes a large disturbance)
is only $O(1/Y)$ which can be kept small by choosing $Y$ large enough.

We will use POVMs based on $\bar{\Pi}_{xL}$ for some integer $L$ and $x \in \{0, 1, \ldots\}$ (i.e. moving
in steps of $L$). For simplicity we assume $L > Y$. We apply the POVMs in decreasing order
(i.e. starting with the one furthest to the right).

If all the projectors in the cluster are to the right of the boundary value k, then whichever 
one is chosen it is sure to project into its positive subspace, since all the qubits to the right 
of that position are certainly zeroes. Thus no disturbance is caused, and an outcome of "1" 
is guaranteed. Given this outcome, we move to the next lower POVM (i.e. decrease $x$ by
one), and measure again. However, these measurements only provide upper bounds on the
value of k. To obtain a lower bound on k as well, we must make a measurement in which 
some projectors in the cluster lie to the left of the boundary, in which case some disturbance 
is inevitably caused. 

The expected disturbance depends on k only through the value of k (mod L). It is clear 
that, since the action of projectors to the right of the boundary is entirely deterministic and 
non-disruptive, we would observe just the same success rate if the value of k were decreased 
or increased by $L$ - there would simply be one more or one less POVM applied to the right.

We can therefore subsequently ignore the value of $k$ itself, and consider only $k \pmod L$,
which we denote $K$. In principle we must consider each case individually, from $K = 0$
through to $K = L - 1$, but we find that they divide up into two classes.

5.1 Disturbance Bounds 
Fidelity when $K \ge Y$

When $K \ge Y$, we know that all the projectors in the POVM lie to the left of the boundary.
The fidelity in this case is therefore an average over the behaviour of each of these projectors.

The separation $s$ ranges from $K$, when the leftmost projector is chosen, to $K - (Y - 1)$,
when the rightmost projector is chosen. The average is therefore:

$$F(K \ge Y) = \frac{1}{Y} \sum_{i=0}^{Y-1} \left(1 - 2^{-(K-i)}\right)^2 \qquad (13)$$

It can easily be seen that the above argument depends only on the fact that all projectors
in the POVM are to the left of the boundary, and therefore holds for any value of $K \ge Y$.
Expanding the square and neglecting the small squared term (and using
$\sum_{i=0}^{Y-1} 2^{-(K-i)} < 1$ for $K \ge Y$) we get a simple lower bound

$$F(K \ge Y) > 1 - \frac{2}{Y}$$

in this case.

Fidelity when $K < Y$

In those cases where $K < Y$, the above argument does not go through. The boundary now
lies within the "cluster". Some of the projectors lie to the left of the boundary, but there
are others on the boundary and to the right. The projectors to the left can be treated in
the same way as above: each has probability $1/Y$ of being chosen; $s$ ranges from $K$ (for the
leftmost projector) to 1 (for the projector immediately to the left of the boundary). This
gives the first term in the formula below.

When a projector to the right of the boundary is chosen, it will give the outcome "1"
with certainty, and we will move to the next lowest POVM, whose projectors lie around
$L$ places further to the left. All these projectors are therefore to the left of the boundary
(since $L > Y$). The fidelity contributed in this case is the average over that cluster, which
is given in the same way as in the $K \ge Y$ case, but now with $s$ running from $K + L$ to
$K + L - (Y - 1)$. Finally, there are $Y - K$ projectors to the right of the boundary, each
with probability $1/Y$ of being chosen, which gives the weight on the second term.

Putting this all together gives us:

$$F(K < Y) = \frac{1}{Y} \sum_{s=1}^{K} \left(1 - 2^{-s}\right)^2 + \frac{Y-K}{Y} \cdot \frac{1}{Y} \sum_{i=0}^{Y-1} \left(1 - 2^{-(K+L-i)}\right)^2 \qquad (14)$$

As in the previous case we expand the squares and neglect the small square terms:

$$F(K < Y) > \frac{1}{Y} \sum_{s=1}^{K} \left(1 - \frac{2}{2^{s}}\right) + \frac{Y-K}{Y^2} \sum_{i=0}^{Y-1} \left(1 - \frac{2}{2^{K+L-i}}\right)$$

Re-arranging gives:

$$F(K < Y) > 1 - \frac{2}{Y} \left[ \left(1 - 2^{-K}\right) + \left(\frac{Y-K}{Y}\right) \frac{1 - 2^{-Y}}{2^{K+L-Y}} \right]$$

and so

$$F(K < Y) > 1 - \frac{2}{Y} \qquad (15)$$

Thus the same lower bound on fidelity applies for all possible values of K, and can be 
considered a worst case value for fidelity. 

Therefore, if we choose Y large enough (recalling that n can be arbitrarily large), we 
can always guarantee that the fidelity of the truncation process is greater than 1 − ε for any 
given ε > 0. 
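
As a numerical sanity check on the worst-case bound, the following Python sketch evaluates 
the cluster average exactly for every offset K. It assumes that a projector at separation s 
contributes fidelity (1 − 2^(−s))^2, as in the averages above, and that the bound has the form 
1 − 2/Y. The function names are illustrative only and are not part of the scheme. 

```python
from fractions import Fraction


def term(s):
    """Assumed fidelity contribution (1 - 2^-s)^2 of a projector at separation s."""
    return (1 - Fraction(1, 2 ** s)) ** 2


def average_fidelity(K, Y, L):
    """Average fidelity over the Y projectors of a cluster, for offset K = k mod L."""
    if K >= Y:
        # Boundary outside the cluster: s runs over K-Y+1 .. K.
        return sum((term(K - i) for i in range(Y)), Fraction(0)) / Y
    # Boundary inside the cluster: K projectors lie to its left (s = 1 .. K) ...
    left = sum((term(s) for s in range(1, K + 1)), Fraction(0)) / Y
    # ... and the other Y-K defer to the next POVM, L places further left.
    next_cluster = sum((term(K + L - i) for i in range(Y)), Fraction(0)) / Y
    return left + Fraction(Y - K, Y) * next_cluster


def worst_case_fidelity(Y, L):
    """Smallest average fidelity over all offsets K = 0 .. L-1."""
    return min(average_fidelity(K, Y, L) for K in range(L))


if __name__ == "__main__":
    for Y in (2, 5, 20, 100):
        worst = worst_case_fidelity(Y, Y + 1)
        print(Y, float(worst), float(1 - Fraction(2, Y)))
```

Exact rational arithmetic is used so that the comparison with 1 − 2/Y is not affected by 
rounding. 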

Error probability when K ≥ Y 

We saw above that the probability of a projector Π_l at position l = k − s projecting into its 
positive subspace, and thus setting s maximally mixed qubits to zero, is P_e(s) = 2^(−s). We 
can therefore write the expected error probability (averaging over choice of projector) for 
any value of K. 

When K ≥ Y, and the boundary lies outside the cluster, we simply average P_e(s) over 
the projectors in the cluster: 

$$P_e(K \ge Y) = \frac{1}{Y} \sum_{i=0}^{Y-1} 2^{-(K-i)} = \frac{1}{Y} \left(\frac{2^{Y} - 1}{2^{K}}\right) \tag{16}$$

Therefore we have an upper bound on the error probability in this case: 

$$P_e(K \ge Y) < \frac{1}{Y} \tag{17}$$

Error probability when K < Y 

When the boundary lies within the cluster, an argument similar to that used for fidelity 
applies. The projectors in the cluster which lie to the left of the boundary each have 
probability 1/Y of being chosen, so we average over their contributions. Additionally, we 
have a probability (Y − K)/Y of choosing one of the projectors which lie to the right of the 
boundary; if this happens, we move to the next POVM to the left, and average over the 
error probabilities for the projectors in that cluster. This gives us: 

$$P_e(K < Y) = \frac{1}{Y} \sum_{s=1}^{K} P_e(s) + \left(\frac{Y-K}{Y}\right) \frac{1}{Y} \sum_{i=0}^{Y-1} P_e(K+L-i)$$

Now expanding the sums and re-arranging: 

$$P_e(K < Y) = \frac{1}{Y} \sum_{s=1}^{K} 2^{-s} + \frac{Y-K}{Y^2} \sum_{i=0}^{Y-1} 2^{-(K+L-i)} = \frac{1}{Y} \left[ \left(1 - 2^{-K}\right) + \left(\frac{Y-K}{Y}\right) \left(\frac{2^{Y} - 1}{2^{K+L}}\right) \right]$$

so 

$$P_e(K < Y) < \frac{1}{Y} \tag{19}$$

and we have a worst case error probability for all possible values of K. If we can make Y 
large enough, the probability that we will erroneously project a qubit to |0⟩ can be made 
smaller than any given ε > 0. 
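
The error-probability averages can be checked in the same way. The sketch below assumes 
P_e(s) = 2^(−s) and the two-case averaging described above; `expected_error` is a hypothetical 
helper name, not something defined in the paper. 

```python
from fractions import Fraction


def p_err(s):
    """Assumed probability 2^-s that a projector at separation s errs."""
    return Fraction(1, 2 ** s)


def expected_error(K, Y, L):
    """Expected error probability, averaged over the choice of projector."""
    if K >= Y:
        # Whole cluster lies to the left of the boundary.
        return sum((p_err(K - i) for i in range(Y)), Fraction(0)) / Y
    # K projectors to the left of the boundary, separations 1 .. K.
    left = sum((p_err(s) for s in range(1, K + 1)), Fraction(0)) / Y
    # Remaining Y-K choices pass control to the next cluster, L places left.
    next_cluster = sum((p_err(K + L - i) for i in range(Y)), Fraction(0)) / Y
    return left + Fraction(Y - K, Y) * next_cluster


if __name__ == "__main__":
    # Worked instance: Y = 4, L = 5, K = 2 gives 3/16 + 15/1024 = 207/1024.
    print(expected_error(2, 4, 5))
```

For Y = 4, L = 5 and K = 2 the two terms are 3/16 and 15/1024, giving 207/1024, which is 
below 1/Y = 1/4 as the bound requires. 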

Uncertainty in k 

We now determine how much information we can expect to obtain from this procedure. We 
continue to apply POVMs until a measurement gives the "0" outcome, indicating that we have 
found a subsequence that is not all zeroes. When this result is obtained (and assuming 
that we have not made an error), we know the range of values that k could have. 

Given a "0" outcome from some POVM, the boundary cannot lie to the left of this 
POVM (or else it would certainly have given a "1" outcome instead). Also, it can lie no 
further to the right than the rightmost edge of the previous POVM, for if the boundary 
were there, the previous POVM would already have given a "0" outcome (or else made an 
error). 

The position of k is therefore constrained to the L + Y positions between the outermost 
edges of the last two POVMs. That is as much information as we can obtain. 

The constraints of high fidelity and low error probability require us to make Y large. 
However, the only constraint on L is the initial assumption that L > Y (and it is easy 
to see that removing this constraint does not reduce the uncertainty). We therefore pick 
L = Y + 1, and so have a final uncertainty in k of the order of 2Y. Thus given any fixed 
ε > 0 we can choose Y = O(1/ε) (independent of n) and hence for all suitably large n we 
can learn the value of k/n with an uncertainty of 2Y/n (which tends to 0 as n → ∞) while 
maintaining a fidelity of 1 − O(ε) and error probability O(ε). 
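
The parameter choice can be illustrated with a short Python sketch. It assumes worst-case 
bounds of the form F > 1 − 2/Y and P_e < 1/Y, so that Y = ⌈2/ε⌉ is one sufficient choice; 
the function name and the returned field names are hypothetical. 

```python
import math


def truncation_parameters(eps, n):
    """Pick Y and L for a target eps, assuming F > 1 - 2/Y and P_e < 1/Y."""
    Y = math.ceil(2 / eps)   # gives 2/Y <= eps, hence also 1/Y <= eps
    L = Y + 1                # smallest spacing satisfying L > Y
    window = L + Y           # positions k may occupy after a "0" outcome
    return {
        "Y": Y,
        "L": L,
        "window": window,
        "relative_uncertainty": window / n,  # uncertainty in k/n
    }


if __name__ == "__main__":
    for n in (10**3, 10**6, 10**9):
        print(n, truncation_parameters(0.25, n))
```

With ε = 0.25 this gives Y = 8, L = 9 and a window of 17 positions, whatever the value of n, 
so the relative uncertainty falls off as 1/n. 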

5.2 Weakening the Assumptions 

In the simplification we assumed that each qubit in the first section was maximally mixed, 
and every qubit in the second section was in state |0⟩; in reality, this will not be the case. 
These assumptions were not used directly in the argument, but only to put bounds on 
the behaviour of the fidelities and projector probabilities. We can therefore weaken the 
assumptions to something more realistic, whilst preserving the inequalities derived. 

We assume that there exists a number k such that: 

1. A projector Π_{k+s} to the right of that position has a probability to project into its 
perpendicular subspace that decreases exponentially with s; the drop in fidelity when 
this happens decreases exponentially with s. 

2. A projector Π_{k−s} to the left of that position has a probability to project into its 
positive subspace that decreases exponentially with s; the drop in fidelity when this 
happens decreases exponentially with s. 

These assumptions effectively say that the components with lengths significantly greater 
or less than Rc have exponentially small probability, which we would expect to be a property 
of classical i.i.d. sources (and even some non-i.i.d. sources) condensed by a classical universal 
scheme such as the Lempel-Ziv scheme, and hence of our condensation scheme too. With 
these weaker assumptions in place, the derivations of the bounds on fidelity and error 
probability can be repeated. 

Thus the procedure defined in this section allows us to truncate the condensed quantum 
sequence, removing virtually all the "blank" qubits (except for a small constant number 
independent of the sequence length n), whilst leaving the fidelity bounded toward unity. The 
procedure does not depend upon any prior knowledge of the source, and can be applied to 
any sequence for which one can make the above assumptions - in particular, those sequences 
which are produced as output from our condensation algorithm. 

6 Learning the Eigenbasis 

Combining the Condensation and Truncation procedures described above, we see that, with 
knowledge of the eigenbasis of the source density matrix, but no information about its 
eigenvalues, optimal compression to S(ρ) qubits per signal can be attained asymptotically. 
For large enough n the overhead in the achieved rate can be made smaller than any 
prescribed δ > 0. 

However, without knowing the source eigenbasis we can only compress asymptotically 
to a rate Rc defined by Eq. (10). Although we do not know the eigenbasis in advance 
(by assumption), we now demonstrate that the above procedure can be used iteratively, to 
learn the eigenbasis of the source with arbitrarily small disturbance and hence achieve a 
fully universal compression scheme. 

Since we can compress the original sequence while maintaining arbitrarily high fidelity 
and arbitrarily low probability of error, we can repeat the process with a different 
computational basis. We can continue iterating the compression-decompression process as often 
as we choose; on each iteration the error probability and fidelity become worse, but remain 
bounded close to zero and unity respectively, with bounds determined by the value of Y and 
the number of iterations. For any given number of iterations we can always choose Y large 
enough to ensure that we meet any previously specified bounds. 

More precisely, for any given δ > 0 we may compress the source to S(ρ) + δ qubits per 
signal as follows (and we simultaneously obtain an estimate of the eigenbasis). Let d be 
the dimension of the single signal space. The unitary group U(d), viewed as a subset of C^(d^2), 
inherits the standard Euclidean distance |U_1 − U_2|, and it acts transitively on the set of all 
orthonormal bases in the signal space. For any density matrix ρ and U ∈ U(d) let H(U, ρ) 
be the condensation rate of Eq. (10) when the mismatch between the computational basis 
and eigenbasis is given by U. Thus H(U, ρ) is the Shannon entropy of the probability 
distribution p_j given by the doubly stochastic transform of the eigenvalues λ_i of ρ: 

$$p_j = \sum_i |U_{ij}|^2 \lambda_i$$

and H(I, ρ) = S(ρ) for any ρ. 
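
To make the definition concrete, here is a minimal Python sketch that evaluates H(U, ρ) from 
the eigenvalues of ρ and a mismatch unitary U, using p_j = Σ_i |U_ij|^2 λ_i. The qubit rotation 
used as U and all function names are illustrative assumptions. 

```python
import math


def shannon(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def condensation_rate(U, eigenvalues):
    """H(U, rho): entropy of p_j = sum_i |U_ij|^2 * lambda_i."""
    d = len(eigenvalues)
    probs = [
        sum(abs(U[i][j]) ** 2 * eigenvalues[i] for i in range(d))
        for j in range(d)
    ]
    return shannon(probs)


def rotation(theta):
    """Real 2x2 rotation, used here as an example mismatch unitary."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


if __name__ == "__main__":
    eigenvalues = [0.9, 0.1]
    print("S(rho)       =", shannon(eigenvalues))
    for theta in (0.0, 0.1, math.pi / 8, math.pi / 4):
        print("H(U, rho) at", theta, "=", condensation_rate(rotation(theta), eigenvalues))
```

With no mismatch (θ = 0) the rate equals S(ρ); for this example it rises towards 1 bit as the 
rotation approaches π/4, where both outcomes become equally likely. 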

Now for each given ρ and δ let T(ρ, δ) be the largest real number such that 

$$|U - I| < T(\rho, \delta) \quad \text{implies} \quad |S(\rho) - H(U, \rho)| < \delta.$$

Clearly from the definition, T(ρ, δ) > 0 for all ρ and δ > 0, and if we allow ρ to vary over 
all density matrices then the quantity 

$$T(\delta) = \min_{\rho} T(\rho, \delta)$$

will be strictly positive. 

Next note that the group U(d) is compact so there exists a finite mesh M(δ) of points 
{V_i} in U(d) with the property that for every U ∈ U(d) there is a V_i with |U − V_i| < T(δ) (and 
hence |U − V_i| < T(ρ, δ) for any ρ). Let {B_i} be the corresponding set of bases obtained by 
applying the transformations of M(δ) to the computational basis. Now given any source 
(with unknown density matrix ρ), the above definitions will guarantee that the condensation 
rate H(V_i, ρ) (i.e. obtained by using B_i as the computational basis) will have the property 
that 

$$|S(\rho) - H(V_i, \rho)| < \delta \quad \text{for at least one } i = i_0,$$

i.e. H(V_{i_0}, ρ) < S(ρ) + δ. 

Thus by iteratively compressing and decompressing sequentially relative to the finite list 
of bases B_i we can choose the one giving the smallest condensation rate and hence compress 
the source to S(ρ) + δ qubits per signal. We also learn the identity of the minimal basis, 
which then provides an estimate of the eigenbasis. Depending on the (fixed, finite) size of 
the set M(δ) we can choose the fidelity and error bounds sufficiently tight (i.e. Y and n 
large enough) in each iteration to meet any prescribed bounds for the total process. 
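
The mesh search can be illustrated classically for a single qubit. The Python sketch below 
is a toy model under strong assumptions: the eigenbasis differs from the computational basis 
by a real rotation, the mesh is a uniform grid of rotation angles on [0, π/2), and each rate 
is computed directly from known eigenvalues. In the actual scheme the rate for each basis 
would instead be estimated from the measured length of the condensed sequence. All names are 
hypothetical. 

```python
import math


def shannon(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def rate_for_mismatch(angle, eigenvalues):
    """H(U, rho) for a real qubit rotation U through `angle`."""
    lam0, lam1 = eigenvalues
    c2 = math.cos(angle) ** 2
    p0 = c2 * lam0 + (1 - c2) * lam1
    return shannon([p0, 1 - p0])


def best_mesh_basis(true_angle, eigenvalues, mesh_size):
    """Try every mesh basis and keep the one with the smallest rate."""
    mesh = [i * math.pi / (2 * mesh_size) for i in range(mesh_size)]
    rates = [rate_for_mismatch(true_angle - phi, eigenvalues) for phi in mesh]
    i0 = min(range(mesh_size), key=rates.__getitem__)
    return mesh[i0], rates[i0]


if __name__ == "__main__":
    eigenvalues = (0.9, 0.1)
    phi, rate = best_mesh_basis(0.4, eigenvalues, 64)
    print("estimated angle:", phi)
    print("achieved rate:", rate, "entropy:", shannon(eigenvalues))
```

A finer mesh brings the selected rate closer to S(ρ), at the cost of more 
compression-decompression iterations and hence a larger Y to keep the accumulated disturbance 
within bounds. 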

7 Concluding remarks 

The argument presented here demonstrates that a pure quantum i.i.d. source can be 
optimally compressed (i.e. compressed asymptotically to its von Neumann entropy) with no 
prior knowledge of the structure of the source. The possibility of such universal quantum 
compression has also been recently demonstrated by Hayashi and Matsumoto, but there 
are significant differences in the two approaches. We introduced a diagonalisation procedure 
that enables any classical algorithm to be utilised in a quantum context. For any classical 
compression algorithm this gives a simple method of determining the effect of the resulting 
quantum algorithm on sources that are not diagonal in the computational basis. This leads 
to a simplified method of estimating the entropy of an i.i.d. source in terms of an associated 
length parameter, while maintaining high fidelity for sufficiently long blocks of signals. As 
a by-product, we also estimate the eigenbasis of the source. 

Our diagonalisation procedure may be applied to any classical algorithm and it may 
be interesting to explore its applicability in other cases (in addition to the Lempel-Ziv 
algorithm) and even beyond issues of information compression. The (not universal) quantum 
compression schemes of Schumacher and of Schumacher and Westmoreland may be 
viewed as similar translations of classical algorithms, but in those cases knowledge of the 
source makes it unnecessary to invoke the diagonalisation analysis. 

Our universal quantum compression scheme was based on a quantum implementation 
of the classical Lempel-Ziv algorithm. This classical algorithm is known to be applicable 
to a target sequence, such as literary text, that is not produced by an i.i.d. source. (Indeed 
the algorithm is widely used in practice for compressing computer files). Hence unlike any 
previously proposed quantum compression scheme, our scheme is also applicable to such 
more general target sequences of quantum states and it would be interesting to explore its 
performance for various kinds of non-i.i.d. sources. 

Intuitively, the classical Lempel-Ziv algorithm operates by building up and continually 
improving a model of the source, based on increasing numbers of already received signals, 
which may then be used to reduce the resources needed in further transmissions. Our 
quantum translation merely mimics this procedure in the computational basis and so 
it would be especially interesting to investigate whether a "truly quantum" extension of 
the Lempel-Ziv idea exists, in the context of quantum information, that is not tied to a 
particular basis. 